TUTOR: Samantha Dawson
LECTURER: Patricia Menéndez
“The World Bank Group and LinkedIn have created the Digital Data for Development collaboration to support innovative policy decisions as developing countries grapple with a rapidly changing global economy. With hundreds of millions of members worldwide, LinkedIn has the potential to offer a new, timely, and granular source of data about emerging industries, workers’ changing skills composition and how they’re engaging with labor markets globally.” ~ 1
This collaboration enables governments and policymakers to design better policies, thus creating opportunities for the global workforce. The data represent LinkedIn members' data on four metrics: Industry Employment Shifts, Talent Migration, Industry Skills Needs and Skills Penetration. The records cover over 100 countries and six major industry sectors (148 industries in total): Financial Services, Professional Services, Information & Communication Technology (ICT), the Arts & Creative Industries, Manufacturing, and Mining/Quarrying. Members' skills are drawn from over 50,000 distinct, standardized skills, which LinkedIn classifies into 249 skill groups and further into five categories: Business Skills, Disruptive Tech Skills, Soft Skills, Specialized Industry Skills and Tech Skills.
The data is referenced below in Section 4.1.
For our project, we will be working on the following questions.
“The Industry Skills Needs metric captures which skills are most likely to be added to a member’s profile in one industry compared to other industries. It’s calculated using an adapted version of a text mining technique called Term Frequency - Inverse Document Frequency (TF-IDF). This method gives more weight to a skill for an industry if more members in the industry list the skill on their profiles and the skill is more unique to the industry. The skills included are those added while a member holds a particular occupation (i.e. the skill flow approach). While the skill flow approach creates a trade-off whereby long-held basic skills, such as Microsoft Office being given a lesser weight, the approach is shown to be stronger at identifying the latest emerging skills in a specific industry than including all historical skills that are added during prior occupations. On balance, since the objective of this metric is to detect the latest skills needs, a skill flow approach is adopted.” ~2
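To make the TF-IDF idea concrete, the sketch below applies the standard TF-IDF formula, treating each industry as a "document" and each skill addition as a "term". It is not LinkedIn's adapted version, whose exact weighting is not given here; the function name `industry_tfidf` and the toy data are hypothetical.

```python
import math
from collections import Counter


def industry_tfidf(skills_by_industry: dict[str, list[str]]) -> dict[str, dict[str, float]]:
    """Score skills per industry with plain TF-IDF (industry = document, skill = term)."""
    n_industries = len(skills_by_industry)
    # Document frequency: in how many industries each skill appears at least once.
    df = Counter()
    for skills in skills_by_industry.values():
        df.update(set(skills))

    scores: dict[str, dict[str, float]] = {}
    for industry, skills in skills_by_industry.items():
        counts = Counter(skills)
        total = len(skills)
        scores[industry] = {
            # TF: share of this industry's skill additions;
            # IDF: larger for skills found in fewer industries.
            skill: (count / total) * math.log(n_industries / df[skill])
            for skill, count in counts.items()
        }
    return scores


# Hypothetical toy data: one list entry per member who added the skill.
toy = {
    "Finance": ["Excel", "Excel", "Risk Management", "Accounting"],
    "Software": ["Excel", "Python", "Python", "Java"],
}
scores = industry_tfidf(toy)
print(sorted(scores["Software"].items(), key=lambda kv: kv[1], reverse=True))
```

Excel appears in both industries, so its IDF term is log(2/2) = 0 and it receives no weight: a skill shared by every industry is not distinctive to any of them.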
The number of skills in each skill category for each industry section is reported below, followed by the most common skill category in each section.
| skill_group_category | Arts, entertainment and recreation | Financial and insurance activities | Information and communication | Manufacturing | Mining and quarrying | Professional scientific and technical activities |
|---|---|---|---|---|---|---|
| Specialized Industry Skills | 266 | 5 | 185 | 228 | 39 | 387 |
| Tech Skills | 118 | 26 | 307 | 88 | 10 | 205 |
| Soft Skills | 83 | 82 | 104 | 151 | 25 | 202 |
| Business Skills | 32 | 184 | 138 | 215 | 26 | 273 |
| Disruptive Tech Skills | 1 | 3 | 66 | 18 | NA | 33 |
| isic_section_name | skill_group_category | n |
|---|---|---|
| Arts, entertainment and recreation | Specialized Industry Skills | 266 |
| Financial and insurance activities | Business Skills | 184 |
| Information and communication | Tech Skills | 307 |
| Manufacturing | Specialized Industry Skills | 228 |
| Mining and quarrying | Specialized Industry Skills | 39 |
| Professional scientific and technical activities | Specialized Industry Skills | 387 |
Figure 2.1: Count of skills in each skill category
The TF-IDF technique used to calculate the skill rank favours skills that are more unique to one industry than to any other. For this reason, Specialized Industry Skills have the highest count in 4 out of 6 industry sections. Information and communication, however, has more Tech Skills; a likely reason is that many Tech Skills, such as Microsoft Office, are basic across all industries and hence are not categorized as Specialized Industry Skills for ICT. The same reasoning applies to Business Skills in Financial and insurance activities.
Figure 2.2: Percentage of skills in each skill category
The 100% stacked bar chart in Figure 2.2 shows the skill category distribution within each industry section. While 4 out of 6 industry sections have a similar distribution, Financial and insurance activities and Arts, entertainment and recreation have rather different distributions. Arts, entertainment and recreation is a field in which each talent is itself a skill, so Specialized Industry Skills dominate (53%). Financial and insurance activities are dominated by Business Skills (61%), followed by Soft Skills (27%). No Disruptive Tech Skills appear among the skills listed for Mining and quarrying.
Overall, Specialized Industry Skills are the most common skill category in Professional scientific and technical activities, while Business Skills are the most important for people to acquire in Financial and insurance activities.
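The within-section shares in Figure 2.2 can be reproduced from the counts in the skill-category table. The minimal Python sketch below does this and also picks the most common category per section; the names `CATEGORIES`, `COUNTS` and `category_shares` are hypothetical, and treating the NA entry as 0 is an assumption.

```python
CATEGORIES = ["Specialized Industry Skills", "Tech Skills", "Soft Skills",
              "Business Skills", "Disruptive Tech Skills"]

# Counts copied from the skill-category table above, listed in CATEGORIES order.
# The NA for Disruptive Tech Skills in Mining and quarrying is treated as 0 (an assumption).
COUNTS = {
    "Arts, entertainment and recreation": [266, 118, 83, 32, 1],
    "Financial and insurance activities": [5, 26, 82, 184, 3],
    "Information and communication": [185, 307, 104, 138, 66],
    "Manufacturing": [228, 88, 151, 215, 18],
    "Mining and quarrying": [39, 10, 25, 26, 0],
    "Professional scientific and technical activities": [387, 205, 202, 273, 33],
}


def category_shares(row):
    """Turn one section's counts into within-section shares (one bar of a 100% stacked chart)."""
    total = sum(row)
    return {cat: n / total for cat, n in zip(CATEGORIES, row)}


for section, row in COUNTS.items():
    shares = category_shares(row)
    top = max(shares, key=shares.get)  # most common category in this section
    print(f"{section}: {top} ({shares[top]:.0%})")
```

For example, Business Skills account for 184 of the 300 skills listed for Financial and insurance activities, which is the 61% quoted above.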
“The Skill Penetration metric looks at how many skills from each of LinkedIn’s skill groups (see “Notes” tab) appear among the top 30 skills for each occupation in an industry. For example, if 3 of 30 skills for Data Scientists in the Information Services industry fall into the Artificial Intelligence skill group, Artificial Intelligence has a 10% penetration for Data Scientists in Information Services. These penetration rates are averaged across occupations to derive the industry averages reported. It is likely this metric is best at capturing skill penetration across tradable and knowledge-intensive sectors. For example, it may under-estimate the adoption of AI in Manufacturing, since LinkedIn members are less likely to be in this sector compared to others.” ~2
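The penetration calculation quoted above can be sketched as follows, reproducing the 3-of-30 = 10% example and then averaging across occupations. This is a minimal sketch that assumes each occupation's top 30 skills are already mapped to skill groups; the function names and occupation lists are hypothetical.

```python
from statistics import mean


def penetration(top_skill_groups: list[str], group: str) -> float:
    """Share of an occupation's top-30 skills that fall in `group`."""
    return top_skill_groups.count(group) / len(top_skill_groups)


def industry_penetration(occupations: dict[str, list[str]], group: str) -> float:
    """Average the per-occupation penetration rates to get the industry figure."""
    return mean(penetration(groups, group) for groups in occupations.values())


# Hypothetical example: each list holds the skill group of each of an occupation's top 30 skills.
data_scientist = ["Artificial Intelligence"] * 3 + ["Other"] * 27
software_engineer = ["Artificial Intelligence"] * 6 + ["Other"] * 24
info_services = {"Data Scientist": data_scientist, "Software Engineer": software_engineer}

print(penetration(data_scientist, "Artificial Intelligence"))  # 0.1, i.e. 10%
print(industry_penetration(info_services, "Artificial Intelligence"))
```

Because the industry figure is an unweighted mean over occupations, an occupation with few members counts as much as a large one; that follows the quoted description, not a documented LinkedIn detail.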
| skill_group_category | skill_group_name | isic_section_name | industry_name |
|---|---|---|---|
| Specialized Industry Skills | Music | Arts, entertainment and recreation | Music |
| Tech Skills | Graphic Design | Professional scientific and technical activities | Graphic Design |
| Business Skills | Insurance | Financial and insurance activities | Insurance |
| Soft Skills | Writing | Information and communication | Writing & Editing |
| Disruptive Tech Skills | Development Tools | Information and communication | Computer Software |
Table 2.2 shows the industry with the highest skill penetration rate in each skill category; the Music industry has the highest rate among all industries.
Figure 2.6: Penetration rate for different industries
Figure 2.6 shows that the Music industry ranks first, with an average penetration rate of 25% over five years; Graphic Design ranks second (22%), Insurance third (9%) and Writing & Editing fourth (8%). Aviation & Aerospace ranked last in 2015 (3%) but was replaced by Computer Software (5%) in the following years.
Figure 2.7: Change in skill penetration rate
Figure 2.7 shows that Specialized Industry Skills, which rank first, fluctuated over the period and peaked at 27% in 2017, while Tech Skills (second, averaging 22%) remained steady. Soft Skills climbed slightly until they overtook Business Skills, which were trending downward, in mid-2017 (8%). Disruptive Tech Skills rank last, with 5% on average.
From the two figures, it appears that workers who want to enter an industry with a strongly specialized field, such as music or graphic design, are often required to master a unique, high-threshold skill, while industries with low technical penetration may require a wider range of alternative skills because the industry is more fragmented. This suggests that the skill penetration rate of an industry is affected by various factors, such as the degree of internal segmentation of the industry and the nature of the industry itself.
In general, hundreds of skills can be grouped into five skill categories. Specialized Industry Skills and Tech Skills have the highest penetration rates, in line with the requirements of industry development. Interestingly, with the advent of big data and new technology, the importance of many traditional skills, such as Business Skills, appears to have gradually declined, as shown by their decreasing penetration rate; this suggests that they are more replaceable in industry and therefore no longer unique.
The employment growth data represent the overall rate of change of employment between consecutive years for an industry, from 2015 to 2019. This rate of change, called the "growth rate", is the percentage change in the number of employees in that industry. The sample of LinkedIn members is limited to those who list a company registered on LinkedIn on their profile. For any year, the number of employees in an industry is obtained cumulatively from the industry shifts of LinkedIn members: the previous year's count, which includes the members who did not change industry, plus the members entering the industry minus the members leaving it. For example, if an industry had 1000 employees in 2014, and 100 employees entered and 50 left it in 2015, then it has 1000 + 100 - 50 = 1050 employees in 2015, and its growth rate for 2015 is (1050 - 1000)/1000 = 0.05, i.e. 5%. The formula for the growth rate in year \(t\) is
\[
\text{growth rate}_t = \frac{\text{membercount}_t - \text{membercount}_{t-1}}{\text{membercount}_{t-1}}
\]
and multiplying it by 100 expresses it as a percentage.
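The headcount update and the growth-rate formula can be checked against the worked example with a few lines of Python. This is a minimal sketch; the function names `headcount` and `growth_rate` are hypothetical.

```python
def headcount(previous: int, entered: int, left: int) -> int:
    """Members in the industry this year: last year's count plus arrivals minus departures."""
    return previous + entered - left


def growth_rate(current: int, previous: int) -> float:
    """Year-on-year growth as a fraction; multiply by 100 for a percentage."""
    return (current - previous) / previous


m_2014 = 1000
m_2015 = headcount(m_2014, entered=100, left=50)
print(m_2015, growth_rate(m_2015, m_2014))  # 1050 0.05
```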
Growth is described by variables such as country_code, country_name, wb_region, wb_income, isic_section_index, isic_section_name, industry_id, industry_name, year and growth_rate. However, the original data are in wide format, with the growth rates for different years in the same row. In accordance with the tidy data definition, the data were therefore reshaped into long format. For this question, records are filtered to the appropriate industry section for each region, and the industries within that section are then analysed.
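The wide-to-long reshaping step can be sketched in plain Python. This sketch assumes the yearly columns are named `growth_rate_2015` to `growth_rate_2019`; that naming, the `to_long` function and the sample record are hypothetical. A library function such as tidyr's `pivot_longer()` or pandas' `DataFrame.melt()` does the same job.

```python
# Hypothetical wide-format record; the growth_rate_<year> column names are an assumption.
wide = [
    {"country_code": "XX", "industry_name": "Banking",
     "growth_rate_2015": 0.01, "growth_rate_2016": 0.02, "growth_rate_2017": -0.01,
     "growth_rate_2018": 0.00, "growth_rate_2019": 0.03},
]

PREFIX = "growth_rate_"


def to_long(rows: list[dict]) -> list[dict]:
    """Pivot the per-year growth columns into one (year, growth_rate) row each."""
    long_rows = []
    for row in rows:
        # Identifier columns are repeated on every output row.
        ids = {k: v for k, v in row.items() if not k.startswith(PREFIX)}
        for key, value in row.items():
            if key.startswith(PREFIX):
                long_rows.append({**ids, "year": int(key[len(PREFIX):]), "growth_rate": value})
    return long_rows


tidy = to_long(wide)
print(len(tidy), tidy[0])
```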
The average growth rate is calculated for each industry section in each region, and the sections with the highest and lowest average growth rate over all years are reported in Table 2.3 and Table 2.4 respectively.
| Region | Industry_section | Avg_growth_rate |
|---|---|---|
| East Asia & Pacific | Financial and insurance activities | 0.026 |
| Europe & Central Asia | Financial and insurance activities | 0.013 |
| Latin America & Caribbean | Financial and insurance activities | -0.003 |
| Middle East & North Africa | Mining and quarrying | 0.008 |
| North America | Financial and insurance activities | 0.026 |
| South Asia | Manufacturing | -0.006 |
| South Asia | Mining and quarrying | -0.006 |
| Sub-Saharan Africa | Manufacturing | 0.006 |
| Region | Industry_section | Avg_growth_rate |
|---|---|---|
| East Asia & Pacific | Mining and quarrying | 0.002 |
| Europe & Central Asia | Professional scientific and technical activities | 0.002 |
| North America | Mining and quarrying | 0.001 |
| Sub-Saharan Africa | Information and communication | -0.009 |
| Latin America & Caribbean | Information and communication | -0.016 |
| Middle East & North Africa | Information and communication | -0.017 |
| South Asia | Information and communication | -0.019 |
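Tables 2.3 and 2.4 can be produced by averaging the long-format growth rates per region and industry section and keeping the extremes. The sketch below assumes rounding to three decimals, as displayed in the tables, and keeps ties, which would explain how South Asia can appear twice in Table 2.3. The rows and the `section_extremes` name are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical long-format rows: (region, industry_section, growth_rate) for several years.
rows = [
    ("South Asia", "Manufacturing", -0.004),
    ("South Asia", "Manufacturing", -0.008),
    ("South Asia", "Mining and quarrying", -0.006),
    ("South Asia", "Information and communication", -0.019),
    ("North America", "Financial and insurance activities", 0.026),
    ("North America", "Mining and quarrying", 0.001),
]


def section_extremes(rows, digits=3):
    """Return {region: (highest_sections, lowest_sections)} from average growth rates."""
    rates = defaultdict(list)
    for region, section, rate in rows:
        rates[(region, section)].append(rate)

    # Average per (region, section), rounded as in the displayed tables.
    averages = defaultdict(dict)
    for (region, section), values in rates.items():
        averages[region][section] = round(mean(values), digits)

    extremes = {}
    for region, by_section in averages.items():
        best, worst = max(by_section.values()), min(by_section.values())
        extremes[region] = (
            [s for s, v in by_section.items() if v == best],  # ties are all kept
            [s for s, v in by_section.items() if v == worst],
        )
    return extremes


print(section_extremes(rows))
```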
However, owing to factors such as natural resources, workforce and infrastructure, the economies of regions depend on specific industries. For example, the Middle East & North Africa has large fossil fuel deposits and hence excels in Oil and Mining, while North America has the infrastructure and workforce for Information and communication industries. Table 2.5 shows the industry sections and the regions in which they had the highest growth.
Deeper insights can be drawn from Figures 2.9 and 2.10.
| Industry_Section | Region | Avg_growth_rate |
|---|---|---|
| Financial and insurance activities | East Asia & Pacific | 0.026 |
| Arts, entertainment and recreation | East Asia & Pacific | 0.008 |
| Mining and quarrying | Europe & Central Asia | 0.008 |
| Mining and quarrying | Middle East & North Africa | 0.008 |
| Financial and insurance activities | North America | 0.026 |
| Information and communication | North America | 0.022 |
| Professional scientific and technical activities | North America | 0.014 |
| Manufacturing | North America | 0.013 |
The presence of industries in the regions is shown in Figure 2.8.
Figure 2.8: Presence of industries in each region
East Asia & Pacific, North America and Europe & Central Asia have been growing in terms of employment, with Financial and insurance activities showing the highest growth.
Industries in South Asia and Latin America & Caribbean have only contracted; the least affected sections were Manufacturing and Mining and quarrying in South Asia, and Financial and insurance activities in Latin America & Caribbean. In Sub-Saharan Africa, all industry sections other than Manufacturing have been declining in employment.
Information and communication has been contracting in Sub-Saharan Africa, Latin America & Caribbean, Middle East & North Africa and South Asia, whereas it has grown strongly in North America.
North America, East Asia & Pacific, and Europe & Central Asia are the regions where all industry sections grew.
North America has led in Financial and insurance activities, Information and communication, Professional scientific and technical activities and Manufacturing, with East Asia & Pacific as its biggest competitor.
Mining and quarrying, however, retains a strong position in Middle East & North Africa.
Figure 2.9: Region: Industry Sections
Figure 2.10: Industry Sections: Region
The industry section selected for each region is based on Table 2.3. A detailed analysis is then made of the industries that fall in these industry sections. The number of industries in each industry section is given in Figure 2.11.
Figure 2.11: Industry Count within each Section
The regions North America, East Asia & Pacific, and Europe & Central Asia have a similar distribution of growth rates for industries in Financial and insurance activities. Industries relating to investments have growth rates between 0.03 and 0.05, far exceeding other industries within this section, whereas Banking remained flat. Interestingly, Oil and Energy declined in the Middle East. These comparisons are shown in Figure 2.12. The aggregated growth rate for each region is shown in the time series graphs in Figure 2.14.
Figure 2.12: Avg. growth of an industry within a region w.r.t best industry section
Each of the time series graphs below shows the cumulative average of the growth rates of industry sections, and each graph compares the regions that share the same industry section. The growth rate for Mining and quarrying in South Asia has been declining below zero, whereas in Middle East & North Africa it has grown steadily. North America and East Asia & Pacific are close competitors in Financial and insurance activities, with North America overtaking East Asia & Pacific in recent years. Manufacturing shows a similar pattern to Mining and quarrying, with steady growth observed in Sub-Saharan Africa.
The trend of industries within each section is shown in Figure 2.15.
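The cumulative average plotted in these graphs can be sketched as a running mean. This sketch assumes "cumulative average" means the mean of the yearly growth rates up to and including each year; the `cumulative_average` name and the rates are hypothetical.

```python
from itertools import accumulate


def cumulative_average(values):
    """Running mean: the k-th output is the mean of the first k values."""
    return [total / k for k, total in enumerate(accumulate(values), start=1)]


# Hypothetical yearly growth rates for one region and industry section, 2015 to 2019.
rates = [0.02, 0.04, 0.00, 0.02, 0.02]
print(cumulative_average(rates))
```

A running mean smooths out single-year spikes, which makes it easier to compare regions, but a late change in trend shows up only gradually.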
Figure 2.13: Time Series: Aggregated Growth Rate
Figure 2.14: Time Series: Aggregated Growth Rate
Figure 2.15: Time Series: Industry Growth Rate
Figure 2.16 shows the presence of skill categories in industry sections, together with the count of skills within each skill category. Specialized Industry Skills far outnumber the other categories because these skills are the most unique to an industry. Business Skills are present in many industries.
Figure 2.16: skill categories in industry sections
No clear relationship is found between skill group rank and skill group penetration rate (Figure 2.17), nor between industry growth rate and skill group penetration rate (Figure 2.18). For some industries, the penetration rate is higher where there is little or no growth, which may suggest that employees in those industries add more skills.
Figure 2.17: Skill Group Rank vs Skill Group Penetration Rate
Figure 2.18: Industry Growth Rate and Skill Group Penetration Rate
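The "no relationship" finding could be quantified with a rank correlation. The sketch below computes Spearman's rho between skill group rank and penetration rate. This technique is our suggestion, not one the analysis above reports using, and the five data points are hypothetical. A value near 0 would support the "no relationship" reading.

```python
def average_ranks(values: list[float]) -> list[float]:
    """Rank values from 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # positions i..j are 0-based
        i = j + 1
    return ranks


def spearman(x: list[float], y: list[float]) -> float:
    """Spearman's rho: the Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


# Hypothetical pairs: skill group rank and penetration rate for five skill groups.
rank = [1, 2, 3, 4, 5]
penetration = [0.05, 0.20, 0.10, 0.08, 0.15]
print(round(spearman(rank, penetration), 2))
```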
The network in Figure 2.19 shows the relationship between industry sections and skill categories, weighted by the mean rank of the skills. Specialized Industry Skills have the highest rank across all industries; however, Financial and insurance activities demand more Business Skills. Business Skills have a fair rank across industries. Tech Skills and Soft Skills are ranked well for all industries: Tech Skills matter most to Information and communication, whereas Soft Skills matter more to Manufacturing. Disruptive Tech Skills are ranked highly only for Information and communication, Manufacturing, and Professional, scientific and technical activities. Note that the count of each industry section and skill category combination in the observations (Figure 2.16) gives a similar result to the ranked network analysis.
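The edge weights of a network like the one in Figure 2.19 can be derived by averaging skill ranks for each (industry section, skill category) pair. This is a minimal sketch: the observations and the `weighted_edges` name are hypothetical, and the graph-drawing step is omitted.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical observations: (industry_section, skill_group_category, skill_group_rank).
observations = [
    ("Financial and insurance activities", "Business Skills", 1),
    ("Financial and insurance activities", "Business Skills", 3),
    ("Financial and insurance activities", "Tech Skills", 8),
    ("Manufacturing", "Soft Skills", 4),
]


def weighted_edges(obs):
    """One edge per (section, category) pair, weighted by the mean rank of its skills."""
    ranks = defaultdict(list)
    for section, category, rank in obs:
        ranks[(section, category)].append(rank)
    return {pair: mean(r) for pair, r in ranks.items()}


for (section, category), weight in weighted_edges(observations).items():
    print(f"{section} -- {category}: mean rank {weight}")
```

Rank 1 is the most important, so if thicker edges should mean higher importance, a plot would need an inverted weight (for example 1 / mean rank).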
Detailed networks of industries and skills, weighted by skill group rank, are shown for each industry section in the subsequent figures.
Figure 2.19: Network: Industry Section and Skill Category
Figure 2.20: Mining and quarrying
Figure 2.21: Manufacturing
Figure 2.22: Information and communication
Figure 2.23: Financial and insurance activities
Figure 2.24: Professional scientific and technical activities
Figure 2.25: Arts, entertainment and recreation
The migration rate is the net flow (arrivals - departures) divided by the member count in the target country and multiplied by 10,000. The migration rate is positive when arrivals exceed departures and negative otherwise. The migration rate of each country, averaged over all industries and years, is shown in the map in Figure 2.26.
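The migration-rate definition translates directly into code. This is a minimal sketch with hypothetical counts; the function name `migration_rate` is ours.

```python
def migration_rate(arrivals: int, departures: int, member_count: int) -> float:
    """Net flow per 10,000 LinkedIn members in the target country."""
    return (arrivals - departures) * 10_000 / member_count


print(migration_rate(arrivals=150, departures=50, member_count=10_000))  # 100.0
print(migration_rate(arrivals=20, departures=70, member_count=25_000))   # -20.0
```

The negative value in the second call means more members left than arrived.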
| country_name | average_migration_rate |
|---|---|
| Luxembourg | 765.3817 |
| United Arab Emirates | 442.7116 |
| Malta | 396.6229 |
| Estonia | 347.1595 |
| Cyprus | 342.0833 |
| Qatar | 332.0523 |
| Panama | 283.6780 |
| Myanmar | 258.0705 |
| Kuwait | 237.3493 |
| Mali | 237.1740 |
| Switzerland | 233.6345 |
| Burkina Faso | 220.4540 |
| Saudi Arabia | 208.4615 |
| New Zealand | 197.5190 |
| Bahrain | 195.1179 |
| Ireland | 182.0494 |
| Singapore | 178.8927 |
| Rwanda | 175.4360 |
| Germany | 171.9108 |
| Papua New Guinea | 169.6560 |
| Japan | 168.5637 |
| Congo, Dem. Rep. | 161.9225 |
| Zambia | 151.8496 |
| Georgia | 150.8675 |
| Australia | 142.9156 |
| Austria | 137.7601 |
| Canada | 133.5462 |
| Chile | 119.2267 |
| Czech Republic | 118.9811 |
| Thailand | 115.2884 |
Figure 2.26: Map: Migration Rate of Countries
The network in Figure 2.27 shows, for each base country, the target country with the highest migration rate, i.e. the country to which members from that base country migrated most. The network is weighted by the average migration rate over the years. The two major clusters, around the United States and India, suggest that members from most countries migrate to the United States. For India, however, these may be returning members who had migrated to the base countries a few years earlier. The migration links also appear to depend on the geographical and historical ties between countries: for example, Venezuela is the target country for countries in Latin America & Caribbean, Hong Kong for China, and West Bank and Gaza for Israel.
Figure 2.27: Highest Migration Rate Selected: Base Country to Target Country
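The edge selection behind Figure 2.27 can be sketched as keeping, for each base country, only the target country with the highest average migration rate. This is a minimal sketch; the country names, rates and the `top_target` name are hypothetical placeholders, not values from the dataset.

```python
# Hypothetical (base_country, target_country, average migration rate) triples.
flows = [
    ("Country A", "United States", 12.0),
    ("Country A", "India", 4.5),
    ("Country B", "United States", 3.0),
    ("Country B", "India", 7.25),
]


def top_target(flows):
    """For each base country keep only the target with the highest average migration rate."""
    best: dict[str, tuple[str, float]] = {}
    for base, target, rate in flows:
        if base not in best or rate > best[base][1]:
            best[base] = (target, rate)
    return best


edges = top_target(flows)
print(edges)
```

Each base country then contributes exactly one edge, weighted by its rate. This is why targets that are the top destination for many base countries form the large clusters described above.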
Figure 2.28: Avg growth of the best industry within a country w.r.t region
Figure 2.28 takes this a step further by adding another layer, Country, to the ??.
Figure 2.29: Trend of best industry within a country w.r.t region
Figure 2.29 extends Figure 2.28 by showing the growth trend in each country over the five years from 2015 to 2019.
Almost every country showed a steady trend, with some exceptions such as:
This analysis report harnesses the dynamic, fast-growing LinkedIn dataset, which covers more than 100 countries, to derive insights into skills, industries and migration trends in the modern world. LinkedIn profiles provide near real-time data because members tend to keep their career profiles up to date, and this kind of data is unlikely to be reflected in government statistics.
“LinkedIn data have unique strengths in that they enable new insights into the emerging digital sectors and skills, with near real-time updates that are unlikely to be reflected in government statistics. Certain tradable and knowledge-intensive sectors also have good coverage across income levels and geographic locations, which allows for global benchmarking. In this manner, it may from the outset serve as a complementary dataset to other government statistics. With the growing use of LinkedIn, these data can become increasingly relevant for developing countries around the globe.” 5
The data provided by the LinkedIn-World Bank Digital Data for Development is a cleaned dataset that only needs to be reshaped into wide or long format depending on the analysis question. In this report, a comprehensive analysis of these metrics was done at the higher levels of classification (skill group categories, industry sections and World Bank regions) to gain an overall understanding of the shifts in their trends. Each question section discussed the shifts in these metrics, and specific details were listed in tables. Several networks were plotted to visualize the relationship between skills and industries and so understand the relevance of a skill to an industry. The growth of industries was studied with respect to changes in their member population.
Specialized Industry Skills have the highest rank across all industries, while Business and Tech Skills were found to be common across all industries and were ranked similarly. Industries were categorized according to their growth rates and mapped to different regions. This mapping showed that North America led in employment growth in several industry sections, including Financial and insurance activities, Information and communication, Professional scientific and technical activities and Manufacturing, with Financial and insurance activities growing the fastest. Again, Business Skills and Tech Skills were highly ranked for this field.
The migration rates were studied, revealing that the United States is a popular migration destination from all over the world. In general, members possess a diverse set of skills, and the common skills, Business and Tech Skills, apply to all LinkedIn members. This commonness lowers the rank of these skills. Hundreds of skills are categorized into five skill categories. Specialized Industry Skills and Tech Skills have the highest penetration rates, in line with the requirements of industry development. Interestingly, with the advent of big data and new technology, the importance of many traditional skills has gradually declined, as shown by their decreasing penetration rate, which suggests that they are more replaceable in industry and therefore no longer unique. However, these skills are basic and must be possessed in this modern era, whereas the other skill categories are industry-specific additions.
The LinkedIn data bring out the general patterns and individual characteristics of industries and LinkedIn members in developed countries, especially in the tradable, technology and digital sectors. However, the dataset has a limitation: the population of developing countries working in non-tradable, non-digital sectors is under-represented.
The LinkedIn-World Bank Digital Data for Development: Industry Jobs and Skills Trends - About
The World Bank: Industry Skills Needs Dataset (3500 × 7); Skill Penetration Dataset (20780 × 7)
The World Bank: Talent Migration Dataset (Industry Migration, 5295 × 13)
The World Bank: Industry Employment Shifts Dataset (7335 × 13)
R Core Team (2021) Xie (2020) Tierney and Lincoln (2021) Wickham et al. (2021) Vanderkam et al. (2018) Wickham (2021a) Hijmans (2019) Kahle, Wickham, and Jackson (2019) Wickham et al. (2020) Pedersen (2021) Slowikowski (2021) Arnold (2021) Müller (2020) file. (2020) Zhu (2021) Xie (2021) Spinu, Grolemund, and Wickham (2021) Brownrigg (2018) Tierney et al. (2020) Pedersen (2020) Sievert et al. (2021) Henry and Wickham (2020) Ryan and Ulrich (2020a) Wickham and Hester (2020) Wickham and Bryan (2019) Therneau and Atkinson (2019) Wickham (2019a) Müller and Wickham (2021) Wickham (2021b) Robinson and Silge (2021) Wickham (2019b) Ulrich (2020) Tierney (2019) Ryan and Ulrich (2020b) Zeileis, Grothendieck, and Ryan (2021) Xie (2016) Kahle and Wickham (2013) Wickham (2016) Csardi and Nepusz (2006) Xie (2015) Xie (2014) Grolemund and Wickham (2011) Sievert (2020) Silge and Robinson (2016) Wickham et al. (2019) Tierney (2017) Zeileis and Grothendieck (2005)